(54) Apparatus and method for deriving an enhanced decoded reduced-resolution video signal
from a coded high-definition video signal 



(57) An improved image processing system (204, 205B, 205E, 206, 207) involves decoding compressed
image data including frequency domain coefficients defining blocks of pixel values representing an
image at a first resolution to provide an image at a reduced second resolution for display from a
selected sub-set of the frequency domain coefficients. The apparatus includes an enhanced
motion-compensation-unit (MCU) (208) operating with blocks of pixel values representing an image at
an intermediate third resolution lower than the first resolution but higher than the reduced second
resolution.
Description

[0001] The present invention relates to the decoding of a coded high-definition (HD) video signal to derive an
enhanced decoded video signal suitable, for example, for recording or producing a picture-in-picture (PIP) or other
reduced-resolution display.

[0002] Known in the art are television receivers that, while displaying a relatively large picture derived from a primary
television channel, also simultaneously display a small picture-in-picture (PIP) derived from a secondary television
channel. In the case of a high-definition television (HDTV) receiver, the receiver must include a relatively complex and
expensive decoder that conforms with the MPEG ISO 13818-2 standard for decoding a received coded HD video signal
in real time for high definition display. However, because the PIP is small, there is no need to provide a high definition
PIP display, because a viewer inherently would not be able to resolve the higher definition components of a high
definition PIP. Therefore, to provide the PIP, the HDTV receiver may be supplied with a second, lower-resolution decoder
that is simpler and less expensive but still conforms with the ISO 13818-2 standard.

[0003] One approach, known in the art, to providing a lower-resolution second decoder which is somewhat simpler
and less expensive than the decoder providing the high definition display, is disclosed in the three U.S. patents
5,614,952, 5,614,957 and 5,635,985, which were, respectively, issued to Boyce et al. on March 25, 1997, March 25, 1997
and June 3, 1997.

[0004] The teaching of copending U.S. patent application S.N. 09/349,865, filed July 8, 1999 and assigned to the
same assignee as the present application, is directed to a lower-resolution second-decoder approach suitable for
deriving a PIP display in real time from a received coded HD video signal that is significantly simpler and less expensive
to implement than is the second decoder disclosed by Boyce et al., but still conforms with the ISO 13818-2 standard.
[0005] A system involves decoding compressed image data including frequency domain coefficients defining blocks
of pixel values representing an image at a first resolution to provide an image at a reduced second resolution. The
system includes a motion-compensation-unit (MCU) processor responsive to a selected sub-set of the frequency
domain coefficients for deriving the image of the reduced second resolution. The motion-compensation-unit (MCU)
processor employs blocks of pixel values representing image data at an intermediate third resolution lower than the first
resolution and higher than the reduced second resolution.

BRIEF DESCRIPTION OF THE DRAWING 

[0006] 

FIGURE 1 is a functional block diagram showing a variable-length decoder (VLD) responsive to an input HD MPEG
data bit-stream for providing a first selected MPEG data output to a PIP decoding means and a second selected
MPEG data output to an HD decoding means;

FIGURE 1a shows an 8x8 block containing the 64 DCT coefficients that are employed by the HD decoding means
of FIGURE 1, FIGURE 1b shows an 8x8 block containing the particular 10 DCT coefficients of the 64 DCT
coefficients shown in FIGURE 1a that are employed by the PIP decoding means of FIGURE 1 for progressive-scan
sequences and FIGURE 1c shows an 8x8 block containing the particular 10 DCT coefficients of the 64 DCT
coefficients shown in FIGURE 1a that are employed by the PIP decoding means of FIGURE 1 for interlaced-scan
sequences;

FIGURE 2 is a simplified functional block diagram of an embodiment of the PIP decoding means of FIGURE 1
which incorporates features of the present invention;

FIGURE 3 is a functional block diagram showing details of the enhanced MCU processing means of FIGURE 2;

FIGURE 4 is a conceptual diagram showing the computational processing performed by the DCT-based upsample
means of FIGURE 3; and

FIGURE 5 is a conceptual diagram showing the computational processing performed by the DCT-based
downsample means of FIGURE 3.

[0007] Referring to FIGURE 1, there is shown VLD 100, PIP decoding means 102 and HD decoding means 104. In
accordance with the known teaching of the MPEG ISO 13818-2 standard, one of the responses of VLD 100 to the input
coded HD MPEG data comprising a sequence of MPEG I, P and B frames is to convey coded picture information
defined by each of successive 8x8 blocks of quantized discrete cosine transform (DCT) coefficients as an input to HD
decoding means 104. Further, in accordance with the known teaching of the MPEG ISO 13818-2 standard, among the
functions performed by HD decoding means 104 is to first perform inverse quantization of each successive 8x8 block
of DCT coefficients and then perform inverse discrete cosine transformation (IDCT) of the DCT coefficients of each
successive 8x8 block. Finally, HD decoding means 104 must perform motion compensation for each P frame and
bi-directionally predictive B frame after IDCT has been performed on that P or B frame.
[0008] FIGURE 1a shows an 8x8 block of DCT coefficients, wherein (1) the value of coefficient DCT_{0,0} (located in
the upper left corner of the 8x8 block) represents the average (DC) value (i.e., both the horizontal and vertical
frequencies are 0) of the picture defined by the 64 values of a corresponding 8x8 block of pixels prior to having undergone
DCT, while (2) the value of coefficient DCT_{7,7} (located in the lower right corner of the 8x8 block) represents the highest
horizontal frequency and highest vertical frequency components of the picture defined by the 64 values of a
corresponding 8x8 block of pixels prior to having undergone DCT. For the case of a HD picture, all, or nearly all, of the 64
DCT coefficients from DCT_{0,0} to DCT_{7,7} inclusive of FIGURE 1a may have non-zero values. This results in a relatively
large amount of image-processing computation to accomplish IDCT in real time. Further, motion compensation also
involves a large amount of real time image-processing computation. Therefore, HD decoding means 104 requires
about 96 Mbits memory to temporarily store MPEG decoded image frames prior to display. HD decoding means 104
requires these frames for motion compensation to reconstruct accurate images for display. Thus, a physical
implementation of HD decoding means 104 is relatively expensive.

[0009] Returning to FIGURE 1, another of the responses of VLD 100 to the input coded HD MPEG data is to convey
only coded picture information defined by a relatively small given number of lower-frequency-defining, quantized DCT
coefficients of each successive 8x8 block as an input to PIP decoding means 102. It is to be noted that PIP processing
and images, and the term PIP itself, are used herein to encompass any form of reduced-resolution image and
processing, and not just television PIP image generation. While the preferred tutorial example of the PIP decoding means
described in the aforesaid patent application S.N. 09/349,865 employed only the 6 lowest-frequency quantized DCT
coefficients, the preferred tutorial example of enhanced-quality PIP decoding means 102, described in detail below,
employs 10 DCT coefficients consisting of DCT_{0,0}, DCT_{1,0}, DCT_{2,0}, DCT_{3,0}, DCT_{0,1}, DCT_{1,1}, DCT_{2,1},
DCT_{0,2}, DCT_{1,2} and DCT_{0,3} shown in FIGURE 1b for progressive-scan use or, alternatively, consisting of DCT_{0,0},
DCT_{1,0}, DCT_{2,0}, DCT_{0,1}, DCT_{1,1}, DCT_{0,2}, DCT_{0,3}, DCT_{0,4}, DCT_{0,5} and DCT_{0,6} shown in FIGURE 1c for
interlaced-scan use, thereby providing better high-frequency response for the enhanced PIP display. More specifically,
the PIP bit-stream received by VLD 100 has been pre-parsed by a VLD-PIP parser (not shown to simplify FIGURE 1) to
remove from the bitstream DCT coefficients which are not needed by the PIP decoder.
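
The two coefficient selections described above can be written down as sets of (u, v) index pairs. The following is a minimal illustrative sketch, not part of the disclosed apparatus: the names `PROGRESSIVE_COEFFS`, `INTERLACED_COEFFS`, `keep_pip_coefficients` and `mask_rows` are hypothetical, and the block is assumed to be stored as `block[v][u]` (row = vertical frequency index).

```python
# (u, v) = (horizontal, vertical) frequency indices of the retained coefficients.
PROGRESSIVE_COEFFS = {(0, 0), (1, 0), (2, 0), (3, 0),
                      (0, 1), (1, 1), (2, 1),
                      (0, 2), (1, 2),
                      (0, 3)}                                   # FIGURE 1b
INTERLACED_COEFFS = {(0, 0), (1, 0), (2, 0),
                     (0, 1), (1, 1),
                     (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)}    # FIGURE 1c


def keep_pip_coefficients(block, progressive_sequence):
    """Zero every coefficient of an 8x8 block that the PIP decoder does not use.

    block is a list of 8 rows; block[v][u] holds DCT_{u,v} (an assumed layout).
    """
    keep = PROGRESSIVE_COEFFS if progressive_sequence else INTERLACED_COEFFS
    return [[block[v][u] if (u, v) in keep else 0 for u in range(8)]
            for v in range(8)]


def mask_rows(keep):
    """Render a selection as 8 text rows: 'x' = kept, '.' = discarded."""
    return ["".join("x" if (u, v) in keep else "." for u in range(8))
            for v in range(8)]


if __name__ == "__main__":
    print("\n".join(mask_rows(PROGRESSIVE_COEFFS)))
    print()
    print("\n".join(mask_rows(INTERLACED_COEFFS)))
```

Printing the two masks makes the trade-off visible: the progressive selection is a small triangle around the DC term, while the interlaced selection keeps a longer column of vertical frequencies.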

[0010] The simplified functional block diagram of the embodiment of enhanced-quality PIP decoding means 102 
shown in FIGURE 2 comprises runlength decoder (RLD) 200, inverse quantizer (IQ) 202, unitary enhanced IDCT, 
filtering and pixel-decimation processing means 204, base-layer adder 205B, enhancement-layer adder 205E, base 
and enhancement-layer decimated pixel memory 206, enhancement-layer encoder 207, enhanced motion compen- 
sation unit (MCU) processing means 208 and sample-rate converter 210. Although the simplified functional block di- 
agram of FIGURE 2 does not show means for controlling the operation of this embodiment of enhanced-quality PIP 
decoding means 102, it should be understood that suitable control means that conform with the requirements of the 
ISO 13818-2 standard is included in a physical implementation of this embodiment. 

[0011] For illustrative purposes, the following description of elements 200, 202, 204, 205B, 205E, 206, 207, 208 and 
210 assumes that each of these elements is being operated in accordance with the above-discussed preferred tutorial 
example. 

[0012] In this example, RLD 200 outputs 10 DCT coefficients for each 8x8 coded block using the 2 scan patterns
defined in the ISO 13818-2 standard. The positioning of the 10 DCT coefficients within each 8x8 block is illustrated in
Figure 1b for progressive scan and in Figure 1c for interlaced scan, as determined by the state of the
progressive_sequence flag. In the Figure 1b progressive sequence case, if the alternate_scan flag from the picture
coding extension for the current picture is 0, the 10 DCT coefficients correspond to coefficients 0,1,2,3,4,5,6,7,8,9 in
1-dimensional scan order, whereas if alternate_scan is 1, the 10 DCT coefficients of interest are coefficients
0,1,2,3,4,5,6,7,8,20 in scan order. In the Figure 1c interlaced sequence case, if the alternate_scan flag from the picture
coding extension for the current picture is 0, the 10 DCT coefficients correspond to coefficients 0,1,2,3,4,5,9,10,20,21
in 1-dimensional scan order, whereas if alternate_scan is 1, the 10 DCT coefficients of interest are coefficients
0,1,2,3,4,5,6,10,11,12 in scan order. There are two run values that have a meaning in RLD 200 which is different from
that described in the ISO 13818-2 standard, depending on the value of the alternate_scan and progressive_sequence
flags. For progressive sequences, if alternate_scan is 0, a run value of 10 indicates that the coefficients needed by
PIP decoder 102 are all 0 and there is no subsequent non-zero coefficient. Similarly, if alternate_scan is 1, a run value
of 21 indicates that the coefficients needed by decoder 102 are all 0 and there is no subsequent non-zero coefficient.
For interlaced sequences, if alternate_scan is 0, a run value of 22 indicates that the coefficients needed by PIP decoder
102 are all 0 and there is no subsequent non-zero coefficient. Similarly, if alternate_scan is 1, a run value of 13 indicates
that the coefficients needed by decoder 102 are all 0 and there is no subsequent non-zero coefficient. Table 1
summarizes the meaning of run values of 10 and 21 for the two possible values of the alternate_scan flag for progressive
sequences and Table 2 summarizes the meaning of run values of 13 and 22 for the two possible values of the
alternate_scan flag for interlaced sequences. All other alternate_scan/run value combinations encountered by RLD 200
are interpreted as described in the ISO 13818-2 standard.
Table 1 - Interpretation of run = 10 and run = 21 by RLD 200 for progressive sequences

Run   Alternate_Scan   Interpretation in PIP RLD
10    0                All DCT coefficients = 0
10    1                Same as ISO 13818-2 standard
21    0                Not allowed
21    1                All DCT coefficients = 0


Table 2 - Interpretation of run = 13 and run = 22 by RLD 200 for interlaced sequences

Run   Alternate_Scan   Interpretation in PIP RLD
13    0                Same as ISO 13818-2 standard
13    1                All DCT coefficients = 0
22    0                All DCT coefficients = 0
22    1                Not allowed
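
The run-value rules of Tables 1 and 2 amount to a small lookup. The sketch below is illustrative only: the names `PIP_SCAN_POSITIONS`, `RUN_TABLE`, `interpret_run` and `zigzag_positions` are hypothetical, and the zigzag generator is the conventional construction (alternate_scan = 0), included only to cross-check the scan positions quoted in the text. The alternate-scan positions are copied from the text rather than derived.

```python
# (u, v) = (horizontal, vertical) frequency index of each retained coefficient.
PROGRESSIVE = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1),
               (1, 1), (2, 1), (0, 2), (1, 2), (0, 3)]
INTERLACED = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1),
              (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]

# Scan positions quoted in the text, keyed by (progressive_sequence, alternate_scan).
PIP_SCAN_POSITIONS = {
    (1, 0): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    (1, 1): [0, 1, 2, 3, 4, 5, 6, 7, 8, 20],
    (0, 0): [0, 1, 2, 3, 4, 5, 9, 10, 20, 21],
    (0, 1): [0, 1, 2, 3, 4, 5, 6, 10, 11, 12],
}

ALL_ZERO = "all DCT coefficients = 0"
STANDARD = "same as ISO 13818-2 standard"
NOT_ALLOWED = "not allowed"

# Direct transcription of Tables 1 and 2:
# (progressive_sequence, run, alternate_scan) -> interpretation.
RUN_TABLE = {
    (1, 10, 0): ALL_ZERO, (1, 10, 1): STANDARD,
    (1, 21, 0): NOT_ALLOWED, (1, 21, 1): ALL_ZERO,
    (0, 13, 0): STANDARD, (0, 13, 1): ALL_ZERO,
    (0, 22, 0): ALL_ZERO, (0, 22, 1): NOT_ALLOWED,
}


def interpret_run(run, progressive_sequence, alternate_scan):
    """Classify a run value; anything not in the tables follows the standard."""
    return RUN_TABLE.get((progressive_sequence, run, alternate_scan), STANDARD)


def zigzag_positions(n=8):
    """Map (u, v) -> position in the conventional zigzag scan of an n x n block."""
    pos, k = {}, 0
    for s in range(2 * n - 1):            # s = u + v indexes one anti-diagonal
        diag = [(u, s - u) for u in range(n) if 0 <= s - u < n]
        if s % 2 == 1:                    # odd diagonals are walked with u decreasing
            diag.reverse()
        for uv in diag:
            pos[uv] = k
            k += 1
    return pos
```

In each of the four cases the "all zero" run value is one more than the last retained scan position, which is why a single run can signal that none of the needed coefficients is non-zero.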



[0013] IQ 202 performs inverse quantization arithmetic and saturation described in the ISO 13818-2 standard on the
10 DCT coefficients shown in FIGURE 1b for progressive sequences and shown in FIGURE 1c for interlaced sequences.
The mismatch control portion of the inverse quantization process is not needed. Conventionally, an extensive
computational process requiring three separate steps is needed to convert the coded frequency domain information in an 8x8
block at the output from IQ 202 into spatial domain picture information comprising respective values of a smaller block
of decimated pixels of a reduced-resolution PIP display image. The first step is to determine the value of each of the
64 (i.e., full pixel density) pixel values of each 8x8 block of picture information as an IDCT function of the
inversely-quantized DCT coefficient values. Thereafter, the second step of lowpass filtering followed by the third step of pixel
decimation may be performed on the pixels in each successive 8x8 block to provide the desired smaller block of
decimated pixels. For instance, a decimation of alternate horizontal and vertical filtered pixels for the case of a
progressive scan would result in a 75% reduction in pixel density. Similarly, a decimation of 3 out of 4 successive filtered
pixels in the horizontal direction for the case of an interlaced scan would also result in a 75% reduction in pixel density.
Thus, in either case, such a decimation performed for luma pixels and also for chroma pixels would result in a reduction
in pixel density from 64 per 8x8 block to only 16 per 8x8 block for each of them. However, the amount of hardware
required to implement this conventional three-step computational process is relatively large and, therefore, relatively
expensive.
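
The conventional three-step process can be sketched as follows for the progressive case. This is an illustration under stated assumptions rather than a description of any particular decoder: the function names are hypothetical, the block is assumed to be stored as `coeffs[v][u]`, and a 2x2 box average is assumed as the low-pass filter, so that filtering and decimation collapse into one averaging step.

```python
import math

N = 8


def idct_8x8(coeffs):
    """Step 1: full-resolution IDCT. coeffs[v][u] holds DCT_{u,v}; returns f[y][x]."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            acc = 0.0
            for v in range(N):
                for u in range(N):
                    if coeffs[v][u]:
                        acc += (c(u) * c(v) * coeffs[v][u]
                                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                                * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[y][x] = 2.0 / N * acc
    return out


def filter_and_decimate_2x2(pixels):
    """Steps 2 and 3: average each 2x2 cell and keep one value per cell (64 -> 16)."""
    return [[(pixels[2 * y][2 * x] + pixels[2 * y][2 * x + 1]
              + pixels[2 * y + 1][2 * x] + pixels[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(4)]
            for y in range(4)]


if __name__ == "__main__":
    block = [[0] * N for _ in range(N)]
    block[0][0] = 64                      # DC-only block
    small = filter_and_decimate_2x2(idct_8x8(block))
    print(small[0])                       # four equal values close to 8.0
```

All 64 pixels are computed even though only 16 values survive, which is the cost the single-step approach avoids.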

[0014] In accordance with the aforesaid patent application S.N. 09/349,865, the unitary IDCT, filtering and
pixel-decimation processing means disclosed therein is able to convert the coded respective values of inversely-quantized
DCT coefficients contained in an 8x8 block at the output from IQ 202 into a smaller block of decimated pixels in a
single-step computational process. Thus, the amount of hardware required to implement this single-step computational
process by means 204 is relatively small and, therefore, relatively inexpensive compared to the aforesaid conventional
three-step computational process.

[0015] Specifically, in accordance with patent application S.N. 09/349,865, the decimated pixel memory thereof
(which, because of pixel decimation, requires a storage capacity size of only 1/4 the capacity size of a corresponding
undecimated pixel memory) comprises a plurality of separate buffers. Each of these buffers is capable of temporarily
storing decimated luma and chroma pixels. In conformity with the ISO 13818-2 standard, the decimated pixel memory
includes one or more buffers for storing decimated pixels that define reconstructed intracoded (I), predictive-coded (P)
and/or bi-directionally predictive-coded (B) frame or field pictures. Further, motion-compensated prediction macroblock
output of pixel values from the MCU processing means is added in an adder to each corresponding macroblock output
derived in the unitary IDCT, filtering and pixel-decimation processing means. The summed pixel values of the output
from the adder are stored into a first buffer of the decimated pixel memory. This first buffer may be a first-in first-out
(FIFO) buffer in which the stored decimated pixels may be reordered between (1) being written into the first buffer and
(2) being read out from the first buffer and written into another buffer of the decimated pixel memory. In the case of a
current P or B frame or field, the decimated pixel memory includes a buffer for storing a macroblock for input to the
MCU processing means to provide motion compensation.

[0016] In accordance with the principles of the present invention, two layers of decimated pixels are advantageously
stored respectively in base and enhancement-layer decimated pixel memory 206 to achieve high-quality motion
compensation and improve PIP image quality. The first of these two layers is a base-layer of decimated pixels and the
second of these two layers is an enhancement-layer of vector-quantized values of luma macroblock decimated pixels
that are employed in enhanced MCU processing means 208 during decoding of P pictures. The enhancement-layer is used
to provide a reduced-resolution image of greater resolution than is obtainable by using just the decimated pixels of the
base-layer. Both the base-layer and this enhancement-layer are employed by enhanced MCU processing means 208,
in a manner described in detail below.

[0017] Preferred embodiments of IDCT, filtering and pixel-decimation processing means 204, enhancement-layer
encoder 207 and enhanced MCU processing means 208 for implementing the present invention will now be described
in detail.

[0018] The unitary enhanced IDCT, filtering and pixel-decimation processing means 204 provides the following sets
of 16 decimated pixel values for each of the base and enhancement-layers used for each of progressive scan and
interlaced scan (each set being a function of 10 DCT coefficient values).

PROGRESSIVE-SCAN, BASE-LAYER SET OF DECIMATED PIXEL VALUES 

[0019] 

20 

g^(O.O) = [8DCTq 0 + 10DCT., 0 + 7DCT2 0 + 4DCT3 0 + IODCTq + 13DCT^^ + ODCTg^^ 
+ 7DCTo 2 + 9DCT-, 2 + 4DCTq 3]/64 

25 

g^(1.0) = [ODCTp 0 + 4DCTi 0 - 7DCT2 0 - ODCTg^ + IODCTq + SDCT^ - ODCTg 
+ 7DCTo 2 + 4DCT., 2 + ^DCTq 3]/64 

30 

Q^{2fi) = [ODCTqq - 4DCT^ 0 - 7DCT2 0 + 9DCT3 0 + IODCTq^ - SDCT^^ - 9DCT2^^ 
+ 7DCTo 2 - 4DCTi 2 4DCTo 3]/64 

35 

gi(3,0) = [8DCT0 Q - lODCT^ 0 + 7DCT2 0 - 4DCT3 0 + IODCTq^ - 13DCTi^ + 9DCT2j 
+ 7DCTo 2 - 9DCT-, 2 + 4DCTo 3]/64 

40 

g^(0,1) = [8DCT0 0 + lODCT^ 0 + 7DCT2Q + 4DCT30 + 4DCTq^ + SDCT^^ + 4DCT2^ 
■ ^DCTq 2 - 9DCT^ 2 - 9DCTo 31/64 

45 

gi(1.1) = [8DCT0 0 + 4DCTi 0 - 7DCT2 0 - 9DCT3 0 + 4DCTo^ + 2DCT^ - 4DCT2 
-7DCTq 2 - 4DCT^ 2 " 9DCTo 3]/64 

50 

g^(2.^) = [8DCT00 - 4DCT^ 0 - 7DCT20 + 9DCT3 ^ + 4DCTo^ - 2DCT^^ - 4DCT2 
- 7DCTq 2 + 4DCT^ 2 ■ 9DCTq 3]/64 

55 

g^(3.1) = [8DCT00 - IODCT1 0 + 7DCT20 - 4DCT30 + 4DCTo^ - SDCT^^ + 4DCT2^i 
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• ' - 7DCTo 2 + 9DCT, 2 - 9DCTo 3J/64 

g,(0,2) = [8DCT(, o + 10DCT, (, + /DCTj q + 4DCT30 - 4DCT(, , - SDCT, , - 4DCT2 ^ 

- 7DCTo_2 - 9DCTi 2 + 90C\^y6A 

g,(1 ,2) = [8DCT0 0 + 4DCT, 0 - 7DCT2 q • 9DCT3 ^ - 4DCTo , - 2001^ , + 4DCT2 ., 

- 7DCT(, 2 - 4DCT, 2 + 9DCTo 3]/64 

9^(2,2) = [SDCTo 0 - 4DCTi 0 - 7DCT20 + ^OCJ^^ - 4DC\^ + 2DCT^ , + 4DCT2 ., 

- 7DCTo 2 + 4DCT^ 2 * 9DCTo 3]/64 

9,(3,2) = [8DCT0 0 - IODCT1 0 + 7DCT2 0 - 4DCT3 0 - 4DCTo , + SDCT, 1 - 4DCT2 , 

- 7DCTo2 + 9DCT, 2 + 9DCTo3]/64 

9,(0.3) = [8DCT00 + lODCT^, + 7DCT20 + 4DCT30 - IODCT0 , - 13DCT, , - 9DCT2 , 
+ 7DCTo 2 + 9DCT, 2 - 4DCTo 3]/64 

9,(1,3) = [SDCTp o + 4DCT, (, - 7DCT20 - 9DCT3 0 - IODCTq , - 5DCT, , + 9DCT2 , 
+ 7DCTo 2 + 4DCT, 2 - 4DCTo 3]/64 

9,(2,3) = [SDCTp 0 - 4DCT, 0 - 7DCT20 + 9DCT3 ^ - IODCTq , + 5DCT, , + 9DCT2 , 
+ 7DCTo2 - 4DCT, 2 - 4DCTo 3]/64 

9,(3,3) = [SDCTq o - 10DCT, 0 + 7DCT2 0 - ^DCJ^ f^ - lODCTp , + 13DCT, , - 9DCT2 , 
+ 7DCTo^ - 9DCT, 2 - ADCTq^M 

PROGRESSIVE-SCAN. ENHANCEMENT-LAYER SET OF DECIMATED PIXEL VALUES 
[0020] 

9o (0,0) = [ODCTq o + ODCT, 0 + ODCTj o + ODCT3 0 + IDCTg , + 1DCT, , + IDCT2 , 
+ 3DCTo 2 + 4DCT, 2 + 6DCTo 3]/64 

9o (1,0) = [ODCTq o + ODCT,_o + 0DCT2,o + ODCT3 0 + IDCT^ , + ODCT, , - IDCT2 , 
+ 3DCTo 2 + 2DCT, 2 + SDCTq 3]/64 
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' 90 (2.0) = [ODCTq o + ODCT, 0 + ODCTjo + ODCTjo + IDCTq i + ODCT, , - IDCTj ., 
+ 3DCTo 2 - 2DCT^ 2 + 6DCTo 3]/64 

Qo (3.0) = PDCTq o + ODCT, 0 + ODCTj q + ODCTg o + IDCTq , - IDCT,^ + IDCTj , 
+ SDCTq 2 - 4DCT, 2 + 6DCT0 3]/64 

Qo (0.2)= [ODCTo 0 + ODCT^ 0 + ODCTj q + ODCT3 0 + 2DCTo , + 3DCT, ^ + 2DCT2 , 
+ SDCTq 2 + 4DCT^ 2 - 2DCTo 3]/64 

Qq (1.2) = [ODCTo o + ODCT^ 0 + ODCTg Q + ODCT3 ^ + 2DCTo^ + 1DCT, , - 2DCT2 , 
+ 3DCTo 2 + 2DCT^ 2 " ^DCTq^VM 

(2.2) = [ODCToo + ODCT, 0 + ODCTj q + ODCT30 + 2DCTo , - IDCT, ^ - 2DCT2 , 
+ 3DCT0 2 - 2DCTi 2 - 2DCT(, 3]/64 

Qo (3.2) = [ODCTq o + ODCT^ 0 + ODCT2 „ + ODCT3 0 + 2DCTo ^ - 3DCT,^ + 2DCT2 , 
+ 3DCTo 2 - 4DCTi 2 - 2DCTo 3]/64 

Oo (0.4) = [ODCTq 0 + ODCT^ 0 + ODCTj g + ODCT3 „ + 2DCTo^ + SDCT^ , + 2DCT2 , 

- SDCTq 2 - 4DCT^ 2 ■ 2DCTo 3J/64 

9o ('••'*) = [ODCToo + ODCT^ 0 + ODCT20 + ODCT30 + 2DCTo , + IDCT, , - 2DCT2 , 

- 3DCTo 2 - 2DCT, 2 - 2DCTo 3]/64 

Qq (2.4) = [ODCTq o + ODCT^ 0 + ODCT2 0 + ODCT3 0 + 2DCTo ^ - 1DCT^ , - 2DCT2 , 

- 3DCTo 2 + 2DCT, 2 - 2DCTo 3]/64 

Qo (3.4) = [ODCTq o + ODCT1 0 + ODCT2_o + ODCT3 0 + 2DCTo ^ - 3001, , + 2DCT2 , 

- 3DCTo 2 + 4DCT, 2 - 2DCTo 3]/64 

Qo (0.6) = [ODCTo o + ODCT, 0 + ODCTj q + ODCT3 0 + IDCTg ^ + IDCT,^ IDCTg , 

- SDCTq 2 - 49DCTi 2 + 6DCT0 3]/64 

90 (1.6) = [ODCTqo + ODCT^ 0 + ODCTjo + ODCT30 + IDCTqi + ODCT, , - IDCTj ^ 



10 



IS 



20 



25 



30 



35 



40 



45 



50 



55 



EP1 054 566A1 

- SDCTq 2 - 2DCT, 2 + 6DCT0 3]/64 

Qo (2.6) = [ODCTo o + ODCT, 0 + ODCTgQ + ODCTj q + IDCTj , + ODCT, , - IDCTj ^ 

- SDCTq 2 + 2DCTi 2 + 6DCT0 3]/64 

Qq (3,6) = [ODCTqo + ODCT, 0 + ODCTjq + ODCT30 + IDCTq , - IDCT^ ^ + IDCT2 , 

- 3DCTo 2 + 4DCTi 2 + 6007^^64 

INTERLACED-SCAN. BASE-LAYER SET OF DECIMATED PIXEL VALUES 
[0021] 

9^(0.0) = [8DCT00 + 7DCTi 0 + IIDCT0 1 + 10DCT, , + IODCT02 + ODCTj,, + QOCTq^ 
+ 8DCT(, 4 + 6DCT0 5 + 4DCTo g]/64 

g, (1,0) = (8DCT0 0 - 7DCT, 0 + IIDCT0 ^ - 10DCT, 1 + 10DCT(, 2 + ODCTj q + SDCTo g 
+ 8DCT0 4 + 6DCT0 5 + 4DCTo e]/64 

(0,1) = [8DCT0 0 + TDCT, 0 + 9DCTq^ + 9DCT,^ + 4DCTo 2 + ODCT2 q - 2DCTo 3 

- 8DCT04 - IIDCT05 - IODCTq 61/64 

9,(1,1) = [8DCT0 0 - 7DCTi 0 + 9DCTo , - 9DCT, , + 4DCTo2 + ODCT20 - 2DCTo 3 

- 8DCT(, 4 - 1 1 DCTq 5 - IODCT0 el/64 

g, (0,2) = [8DCT(, 0 + /DCT, „ + ODCTq , + 6DCT, , - 4DCT{, 2 + ODCT2 0 - HDCTq 3 

- 8DCT0 4 + 2DCTo 5 + IODCTq el/64 

g, (1.2) = [8DCT0 0 - 7DCT, 0 + eOCTp , - 6DCT, , - 4DCT0 2 + ODCT2_o - IIDCT0 3 

- 8DCT0 4 + 2DCTo 5 + IODCTq 61/64 

g, (0,3) = [8DCT0 0 + 7DCTi 0 + 2DCTo , + 2DCT, , - lODCTo j + ODCT2_o - 6DCT0 3 
+ 8DCTe 4 + 9DCTo g - 4DCTo el/64 

g, (1,3) = [8DCTo_o - 7DCTi „ + 2DCTo , - 2DCT, , - 10DCT(, 2 + ODCT2.0 - 6DCT0 3 
+ 8DCT0 4 + 9DCTo 5 - 4DCTo 61/64 
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' 91 (0.4) = [8DCT00 + ZDCT, 0 - 2DCTo , - 2DCTi , - IODCT02 + ODCT2,, + 6DCT03 
+ 8DCT0 4 - 9DCTo 5 - 4DCTo,6]/64 * 

5 

(1.4) = [8DCT0 0 - 7DCTi 0 - 2DCTo^ + 2001^ , - IODCT0 2 + ODCTj „ + 6DCT03 
+ BDCTq 4 - 9DCTo 5 - 4DCTo 6]/64 

10 

g, (0.5) = [8DCT0 0 + 7DCT^ 0 - BDCTq ^ - 6DCT, , - 4007^ 2 + ODCTg q + HDCTq 3 

- 8DCT(, 4 - 2DCTo g + 10DCTo 6]/64 

15 

g,(1.5) = I8DCT0 0 - 7DCT^ 0 - 6DCTo^ + BDCT^ ^ - 4DCTo 2 + ODCTj o + 11DCT(, 3 

- 8DCT0 4 - 2DCTo 5 + 10DCTo.6]/64 

20 

g, (0.6) = [8DCT0 0 + /DCT, 0 - 9DCTo , - 9001^^ + 4DCTo 2 + ODCTj + 2DCTo 3 

- 8DCT04 + IIDCT05 - 10DCTo6]/64 

25 

g, (1,6) = [8DCT0 0 - 7DCT, 0 - 9DCT(, , + 9001, ^ + 4DCTj, 2 + ODCTj o + 2DCT0 3 

- 8DCT5 4 + 1 1 DCTj, 5 - IODCTq g]/64 

30 

g, (0.7) = (8DCT00 + 7DCT^ 0 - IIDCT0 , - 10DCT, ^ + IODCTq j + ODCTjo - 9DCT03 
+ 8DCT0 4 - 6DCT0 5 + 4DCTo 6]/64 

35 

g-i (1.7) = I8DCT00 - 7DCT^ 0 - IIDCTq, + lODCT^^ + IODCTq j + ODCTjq - 9DCT03 
+ 8DCTo_4 - 6DCTo_5 + ADC\^y64 

40 

UNTERLACED-SCAN. ENHANCEMENT-LAYER SET OF DECIMATED PIXEL VALUES 

Qo (0.0) = IODCTq o + 3DCTi 0 + ODCTq , + 4DCT^^ + ODCTg 2 + 7DCT20 + ODCT0 3 
+ ODCTq 4 + ODCTq 5 + ODCTq 6]/64 

go (2.0) = [ODCTq 0 + 3DCT^ 0 + ODCTo ^ + 4DCT^ ^ + ODCT0 2 - 7DCT2,o + ODCTq 3 
+ ODCTq 4 + ODCTq 5 + ODCTq 6]/64 

go (0.1) = [ODCTq 0 + 3DCT^ 0 + ODCTq^ + 4DCTi^ + ODCT0 2 + 7DCT2.0 + ODCTq 3 
+ ODCTq 4 + ODCTq 5 + ODCTq 6]/64 



[0022] 

45 



50 
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■ (2.1) = [ODCTqo + 3DCT^ (, + ODCTg^ + 4DCT, ^ + ODCTq j - 7DCT2_(, + ODCTq j 
+ ODCTq 4 + ODCTq 5 + ODCTq e]/64 

Oo (0.2) = [ODCTjo + 3DCTi 0 + ODCTp , + 2DCTi ^ + ODCTj, g + 7DCT2_o + ODCT03 
+ ODCTq 4 + ODCTq 5 + ODCTp g]/64 

go (2.2) = [ODCT0.0 + 3DCTi 0 + ODCTq , + 2DCTi ,+ ODCTo j - 7DCT2.0 + ODCT0 3 
+ ODCTq 4 + ODCTq 5 + ODCTq 6]/64 

Qo (0.3) = [ODCTq o + 3DCTi 0 + ODCTq^ + 1DCT^^ + ODCTo j + 7DCT2_o + ODCT^ ^ 
+ ODCTq 4 + ODCTq 5 + ODCTq 6]/64 

Qo (2.3) = [ODCTo o + 3DCTi 0 + ODCTq , + 1DCT, ^ + ODCT0 2 - 7DCT2 0 + ODCT^ ^ 
+ ODCTp 4 + ODCTq 5 + ODCTo 61/64 

90 (0.4) = [ODCTq o + 3DCTn, + ODCTq , - IDCT, , + ODCTq 2 + 7DCT2 0 + ODCTq j 
+ ODCTq 4 + ODCTq 5 + ODCTq 61/64 

go (2,4) = [ODCTq o + 3DCTi 0 + ODCTg , - IDCT^ , + ODCT0 2 - 7DCT2 0 + ODCTo g 
+ ODCTq 4 + ODCTq 5 + ODCTo 61/64 

go (0.5) = [ODCTo o + 3DCTi 0 * ODCTq^ - 2DCT,^ + ODCTq 2 + 7DCT2 0 + ODCTo 3 
+ ODCTo 4 * 0°CTo,5 * ODCTo,6l/64 

go (2.5) = [ODCTo o + 3DCTi 0 + ODCTo ., - 2DCTi , + ODCTq 2 - 7DCT2_o + ODCTq ^ 
+ ODCTo 4 + ODCTq 5 + ODCTo 61/64 

go (0.6) = [ODCTo 0 + 3DCTi 0 + ODCTq , - 4DCTi , + ODCTq j + 7DCT2 q + ODCTo 3 
+ ODCTo 4 + ODCTo 5 + ODCTq 61/64 

90 (2.6) = [ODCTo o + 3DCTi 0 + ODCTq^ - 4DCT,^ + ODCTq 2 - 7DCT2 q + ODCTq ^ 
+ ODCTo 4 + ODCTq 5 + ODCTo_6l/64 

9o (0-7) = [ODCTo o * 3DCTi 0 + ODCTo , - 4DCT,^ + ODCTo 2 + 7DCT2,o + ODCT0 3 



EP1 054 566 A1 



+ ODCTq 4 + ODCTq 5 + ODCTq 61/64 

5 Qo (2,7) = [ODCTq 0 + 3DCT^ ^ + ODCTq^ - 4DCT^ , + ODCTq 2 - 7DCT2,o + ODCTq 3 

+ ODCTq 4 + ODCTq 5 + ODCTq 6]/64 

[0023] Each of the above "Progressive-Scan Set of Decimated Pixel Values" and above "Interlaced-Scan Set of
Decimated Pixel Values" was derived in the following manner:

1. If DCT_{u,v} denotes the DCT coefficient with horizontal frequency index u and vertical frequency index v, then the
IDCT equation which would be used to decode a block denoted f(x,y) at full resolution (where x = 0, ..., N-1; y =
0, ..., N-1) is given by

\[
f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\,C(v)\,DCT_{u,v}
         \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N} \tag{1}
\]

where C(0) = 1/\sqrt{2} and C(k) = 1 for k > 0.

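2. Using only the 10 DCT coefficients shown in FIGURE 1b gives the approximation equation 2 for progressive-scan
sequences

\[
f(x,y) \approx \frac{2}{N}\Big[ \tfrac{1}{2} DCT_{0,0}
  + \tfrac{1}{\sqrt{2}} DCT_{1,0} \cos\frac{(2x+1)\pi}{2N}
  + \tfrac{1}{\sqrt{2}} DCT_{2,0} \cos\frac{(2x+1)2\pi}{2N}
  + \tfrac{1}{\sqrt{2}} DCT_{3,0} \cos\frac{(2x+1)3\pi}{2N}
  + \tfrac{1}{\sqrt{2}} DCT_{0,1} \cos\frac{(2y+1)\pi}{2N}
  + DCT_{1,1} \cos\frac{(2x+1)\pi}{2N} \cos\frac{(2y+1)\pi}{2N}
  + DCT_{2,1} \cos\frac{(2x+1)2\pi}{2N} \cos\frac{(2y+1)\pi}{2N}
  + \tfrac{1}{\sqrt{2}} DCT_{0,2} \cos\frac{(2y+1)2\pi}{2N}
  + DCT_{1,2} \cos\frac{(2x+1)\pi}{2N} \cos\frac{(2y+1)2\pi}{2N}
  + \tfrac{1}{\sqrt{2}} DCT_{0,3} \cos\frac{(2y+1)3\pi}{2N} \Big] \tag{2}
\]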
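3. Using only the 10 DCT coefficients shown in FIGURE 1c gives the approximation equation 3 for interlaced-scan
sequences

\[
f(x,y) \approx \frac{2}{N}\Big[ \tfrac{1}{2} DCT_{0,0}
  + \tfrac{1}{\sqrt{2}} DCT_{1,0} \cos\frac{(2x+1)\pi}{2N}
  + \tfrac{1}{\sqrt{2}} DCT_{2,0} \cos\frac{(2x+1)2\pi}{2N}
  + \tfrac{1}{\sqrt{2}} DCT_{0,1} \cos\frac{(2y+1)\pi}{2N}
  + DCT_{1,1} \cos\frac{(2x+1)\pi}{2N} \cos\frac{(2y+1)\pi}{2N}
  + \tfrac{1}{\sqrt{2}} \sum_{v=2}^{6} DCT_{0,v} \cos\frac{(2y+1)v\pi}{2N} \Big] \tag{3}
\]
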
4. Let the right-hand side of each of equations 2 and 3 be denoted f'(x,y). In the case of a progressive scan (i.e.,
the progressive_sequence flag is 1), the base-layer value g_1'(x,y) is computed in accordance with the following
equation 4 and the enhancement-layer value g_0'(x,y) is computed in accordance with the following equation 5:

\[
g_1'(x,y) = \frac{1}{4}\left[ f'(2x,2y) + f'(2x+1,2y) + f'(2x,2y+1) + f'(2x+1,2y+1) \right] \tag{4}
\]
for x = 0, ..., 3; y = 0, ..., 3,

\[
g_0'(x,y) = \frac{1}{4}\left[ f'(2x,y) + f'(2x+1,y) - f'(2x,y+1) - f'(2x+1,y+1) \right] \tag{5}
\]
for x = 0, ..., 3; y = 0, 2, 4, 6.

More specifically, g_1'(x,y) in equation 4 defines the average value of the values of a set of 4 contiguous pixels
(or prediction errors) arranged in a 2x2 block portion of the full-resolution 8x8 block. The value g_0'(x,y) in equation
5 defines the difference between the average value of the values of a first set of 2 contiguous horizontal pixels (or
prediction errors) of one vertical line and the average value of the values of a second set of 2 contiguous horizontal
pixels (or prediction errors) of the following vertical line arranged in a 2x2 block portion of the full-resolution 8x8
block. The 16 equations g_1(0,0) to g_1(3,3) of the above "Progressive-Scan, Base-layer Set of Decimated Pixel
Values" were derived by substituting equation 2 into equation 4, substituting numeric values for x and y in g_1'(x,y),
substituting N = 8, and approximating the weighting factors for the DCT coefficients with rational values. The
16 equations g_0(0,0) to g_0(3,6) of the above "Progressive-Scan, Enhancement-layer Set of Decimated Pixel Values"
were derived in a similar manner by substituting equation 2 into equation 5, substituting numeric values for x and
y in g_0'(x,y), substituting N = 8, and approximating the weighting factors for the DCT coefficients with rational
values. Although the effective pixel decimation of the enhancement-layer is only 2 (rather than being the effective
pixel decimation of 4 of the base-layer), the equalities g_0(x,y+1) = -g_0(x,y) hold for y = 0,2,4,6, so that
enhancement-layer values with odd vertical indexes need not be computed. Thus, only 16 independent g_0(x,y)
enhancement-layer values need be computed for each 8x8 luma block in a progressive-scan I or P picture. Further, because
these 16 g_0(x,y) enhancement-layer values are residual values, they tend to have a small dynamic range.
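
The derivation described in item 4 can be reproduced numerically. The following sketch is an illustration, not the procedure actually used to produce the listed sets: it applies equations 4 and 5 to each DCT basis function of equation 1 with N = 8, scales by 64 and rounds to the nearest integer. The helper names are hypothetical, and nearest-integer rounding is an assumption (the text says only that the weights were approximated with rational values).

```python
import math

N = 8
# (u, v) pairs in the order in which the progressive-scan equations list them.
PROG = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1),
        (1, 1), (2, 1), (0, 2), (1, 2), (0, 3)]


def basis(u, v, x, y):
    """Contribution of a unit DCT_{u,v} to full-resolution pixel (x, y), per equation 1."""
    cu = 1 / math.sqrt(2) if u == 0 else 1.0
    cv = 1 / math.sqrt(2) if v == 0 else 1.0
    return (2.0 / N * cu * cv
            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))


def base_weights(x, y):
    """Equation 4 applied to each basis function, scaled by 64 and rounded."""
    return [round(64 * (basis(u, v, 2 * x, 2 * y) + basis(u, v, 2 * x + 1, 2 * y)
                        + basis(u, v, 2 * x, 2 * y + 1)
                        + basis(u, v, 2 * x + 1, 2 * y + 1)) / 4)
            for u, v in PROG]


def enhancement_weights(x, y):
    """Equation 5 applied to each basis function, scaled by 64 and rounded."""
    return [round(64 * (basis(u, v, 2 * x, y) + basis(u, v, 2 * x + 1, y)
                        - basis(u, v, 2 * x, y + 1)
                        - basis(u, v, 2 * x + 1, y + 1)) / 4)
            for u, v in PROG]


if __name__ == "__main__":
    print(base_weights(0, 0))          # numerators of g_1(0,0)
    print(enhancement_weights(0, 0))   # numerators of g_0(0,0)
```

Under these assumptions the computed integers agree with the numerators of the listed equations, for example g_1(0,0) and g_0(0,0). The zero weights on every DCT_{u,0} term of the enhancement layer follow directly from equation 5, since a coefficient with no vertical frequency contributes equally to both lines being differenced.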

[0024] In the case of an interlaced scan (i.e., the progressive_sequence flag is 0), the base-layer value g_1'(x,y) is
computed in accordance with the following equation 6 and the enhancement-layer value g_0'(x,y) is computed in
accordance with the following equation 7:

\[
g_1'(x,y) = \frac{1}{4}\left[ f'(4x,y) + f'(4x+1,y) + f'(4x+2,y) + f'(4x+3,y) \right] \tag{6}
\]
for x = 0, 1; y = 0, ..., 7,

\[
g_0'(x,y) = \frac{1}{4}\left[ f'(2x,y) + f'(2x+1,y) - f'(2x+2,y) - f'(2x+3,y) \right] \tag{7}
\]
for x = 0, 2; y = 0, ..., 7.

[0025] In the interlaced-scan case of an 8x8 block, g1'(x,y) in equation 6 defines the average value of the values of
a set of 4 contiguous pixels (or prediction errors) arranged in a 4x1 block portion of the 8x8 block. The value g0'(x,y)
in equation 7 defines the difference between the average value of the values of a first set of 2 contiguous horizontal
pixels (or prediction errors) of a vertical line and the average value of the values of a second set of the next 2 contiguous
horizontal pixels (or prediction errors) of the same vertical line arranged in a 4x1 block portion of an 8x8 block. The 16
equations g1(0,0) to g1(1,7) of the above "Interlaced-Scan, Base-layer Set of Decimated Pixel Values" were derived
by substituting equation 3 into equation 6, substituting numeric values for x and y in g1'(x,y), substituting N = 8, and
approximating the weighting factors for the DCT coefficients with rational values. The 16 equations g0(0,0) to g0(2,7)
of the above "Interlaced-Scan, Enhancement-layer Set of Decimated Pixel Values" were derived in a similar manner
by substituting equation 3 into equation 7, substituting numeric values for x and y in g0'(x,y), substituting N = 8, and
approximating the weighting factors for the DCT coefficients with rational values. Although the effective pixel decimation
of the enhancement-layer is only 2 (rather than the effective pixel decimation of 4 of the base-layer), the equalities
g0(x+1,y) = -g0(x,y) hold for x = 0 and x = 2 so that enhancement-layer values with odd horizontal indexes need not be
computed. Thus, only 16 independent g0(x,y) enhancement-layer values need be computed for each 8x8 luma block
in an interlaced-scan I or P picture. Further, because these 16 g0(x,y) enhancement-layer values are residual values,
they tend to have a small dynamic range.
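The interlaced-scan definitions of equations 6 and 7 can be illustrated directly in the pixel domain. The following Python sketch is only an illustration: the decoder described here computes these values from DCT coefficients rather than from pixels, the function name `decimate_interlaced` is hypothetical, and the 1/4 scale factor applied in both equations is an assumption of the sketch.

```python
def decimate_interlaced(block):
    """Pixel-domain form of equations 6 and 7 for one 8x8 block.

    block[y][x] holds f'(x, y). Returns (base, enh) dictionaries keyed by (x, y).
    The 1/4 scale factor used for both layers is an assumption of this sketch.
    """
    base, enh = {}, {}
    for y in range(8):
        row = block[y]
        for x in (0, 1):  # equation 6: mean of 4 contiguous pixels
            base[(x, y)] = (row[4 * x] + row[4 * x + 1]
                            + row[4 * x + 2] + row[4 * x + 3]) / 4
        for x in (0, 2):  # equation 7: first pair minus the next pair
            enh[(x, y)] = (row[2 * x] + row[2 * x + 1]
                           - row[2 * x + 2] - row[2 * x + 3]) / 4
    return base, enh


block = [[10, 10, 20, 20, 0, 0, 0, 8] for _ in range(8)]
base, enh = decimate_interlaced(block)
```

With this scaling, adding the residual to the base value gives the mean of the first pixel pair and subtracting it gives the mean of the second pair, which is why only 16 residuals per block are independent.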

[0026] Returning to FIGURE 2, unit 204 conveys an output comprising successive 8x8 blocks of I, P and B luma and
chroma g1(x,y) base-layer decimated pixel values as a first input to base-layer adder 205B in a predetermined order.
(For non-coded blocks all such values are zero.) This predetermined order includes the decimated pixel values of each
2x2 array of 8x8 pixel luma blocks and each of two chroma blocks which form a decimated macroblock for use by
enhanced MCU processing means 208. Further, unit 208 applies a corresponding block p1(x,y) of base-layer decimated
pixel values as a second input to base-layer adder 205B in this same predetermined order. (For intracoded macroblocks
all such values are zero.) The block s1(x,y) of base-layer decimated pixel values derived as a sum output from
base-layer adder 205B is then stored in memory 206.

[0027] Unit 204 conveys an output comprising the I and P luma g0(x,y) enhancement-layer decimated pixel values
as a first input to enhancement-layer adder 205E in the previously mentioned decimated-pixel macroblock
predetermined order. (For non-coded blocks all such values are zero.) Further, for the case of P luma pixels, unit 208 applies
a corresponding macroblock of 64 p0(x,y) enhancement-layer decimated pixel values as a second input to adder 205E
in this same predetermined order. (For intracoded macroblocks all such values are zero.) The macroblock of 64 s0(x,y)
enhancement-layer decimated pixel values derived as a sum output from adder 205E is applied as an input to
enhancement-layer encoder 207 and then the encoded output bit-words from encoder 207 are stored in memory 206
during decoding of I and P pictures.

[0028] A macroblock at the higher resolution of the enhancement-layer would normally comprise 128 decimated
luma pixel values. However, because of the above-described symmetry equalities for both progressive-scan sequences
and interlaced-scan sequences, the number of independent decimated enhancement-layer pixel values in the block
s0(x,y) is reduced from 128 to 64. Therefore, the predetermined order is such that only half of the enhancement-layer
decimated pixel values need be considered by enhancement-layer encoder 207. These enhancement-layer values are
encoded in pairs using a simple vector quantizer, with each pair of values being represented by an 8-bit codeword.
Since there are 64 enhancement-layer values to be encoded in a macroblock, the number of bits of storage for the
enhancement layer is 32x8 = 256 bits per macroblock. In the preferred embodiment the 32 codewords are combined
into two 128-bit output words from encoder 207 for storage in memory 206.
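The storage arithmetic above (64 values, 32 codewords, two 128-bit words) can be sketched as follows. The text does not state the order in which codewords are placed within a word, so the big-endian layout and the function names `pack_codewords` and `unpack_word` are assumptions made only for illustration.

```python
def pack_codewords(codewords):
    """Pack 32 8-bit codewords into two 128-bit integers.

    Byte order within each word is an assumption (first codeword in the
    most significant byte); the source does not specify it.
    """
    if len(codewords) != 32 or any(not 0 <= c <= 255 for c in codewords):
        raise ValueError("expected 32 codewords in the range 0..255")
    return [int.from_bytes(bytes(codewords[i:i + 16]), "big") for i in (0, 16)]


def unpack_word(word):
    """Split one 128-bit word back into its 16 constituent 8-bit codewords."""
    return list(word.to_bytes(16, "big"))


codewords = list(range(32))
words = pack_codewords(codewords)
restored = unpack_word(words[0]) + unpack_word(words[1])
```

Each word carries 16 codewords, so a macroblock occupies 2 x 128 = 256 bits regardless of the chosen byte order.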

[0029] For progressive sequences each pair of horizontally adjacent values in the block s0(x,y) is encoded as a
two-dimensional vector, whereas for interlaced sequences each pair of vertically adjacent (within the same field) values in
s0(x,y) is encoded as a two-dimensional vector. Let v0 and v1 be a pair of values to be encoded together. The
computational procedure employed by encoder 207 to encode the pair v0, v1 is described in detail in Appendix A. After this
procedure has been completed for each pair of values in s0(x,y), the codewords are packed into two 128-bit words,
which together form the output from encoder 207 that is stored in memory 206. Returning again to
FIGURE 2, memory 206 provides (1) a base-layer output d1(x,y) to unit 208 (d1(x,y) is similar in content to the
base-layer input s1(x,y) provided to memory 206) and (2) an enhancement-layer output to unit 208 (similar in content to the
enhancement-layer input to memory 206).

[0030] In order for enhanced MCU processing means 208 to form a block of predictions, a block of pixel values is
fetched from memory 206. The base-layer pixel values which are read from the stored reference picture are denoted
d1(x,y). The enhancement-layer residual values, which are needed only if the block of predictions being formed is for
the luma component in a P picture, are denoted d0(x,y). Since the enhancement-layer samples are stored in memory
206 in encoded form, the enhancement-layer data output from memory 206 and input to unit 208 is decoded by
enhancement-layer decoder 300 (Figure 3) to obtain the d0(x,y) values. Unit 208 separately forms individual luma or chroma
outputs for field prediction operations corresponding to the top and bottom field prediction blocks. In a bi-directionally
predicted macroblock these operations are performed separately for the forward and backward predictions and the
results are combined as described in the ISO 13818-2 standard. In the following detailed description of the
computational-processing operations performed by unit 208, the symbol / represents integer division with truncation of the result
toward minus infinity, the symbol // represents integer division with truncation of the result toward zero, and the symbol
% represents the modulus operator, which is defined such that if x is a negative number and M is a positive number,
then x%M = M-((x//M)*M-x).
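Because these three operators differ from the conventions of most programming languages, a short Python sketch may help. The helper names are hypothetical. Note that Python's own `//` floors, so it corresponds to the symbol / used here, not to the symbol //.

```python
def div_floor(a, b):
    """The symbol '/': integer division truncating toward minus infinity."""
    return a // b  # Python's // already floors


def div_trunc(a, b):
    """The symbol '//': integer division truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def mod_literal(x, m):
    """The symbol '%' for negative x and positive m, transcribed literally."""
    return m - (div_trunc(x, m) * m - x)
```

For negative x that is not a multiple of M, the literal formula agrees with Python's built-in `%` (for example, both give 3 for x = -5, M = 4). For a negative exact multiple such as x = -4, M = 4, the formula as written evaluates to M rather than 0, so an implementation would presumably need to decide how to treat that case.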

[0031] Before a block of samples can be read from memory 206, the location and size of the block are determined.
The location of a block of pixel values in the reference picture is specified by the horizontal and vertical coordinates of
the start (i.e., the upper-left corner) of the block in the reference picture. For the base-layer, these coordinates are
indexes into a picture which is 1/4 horizontal, full vertical resolution for interlaced sequences and 1/2 horizontal, 1/2
vertical resolution for progressive sequences. For the enhancement-layer, the coordinates are indexes into a picture
which is 1/2 horizontal, full vertical resolution for both interlaced and progressive sequences.
[0032] To locate the blocks d1(x,y) and d0(x,y) in the reference picture, the motion vector for the macroblock being
decoded is needed. The decoding of motion vector data in the bitstream, the updating of motion vector predictors, and
the selection of motion vectors in non-intra macroblocks which contain no coded motion vectors (e.g., skipped
macroblocks) are all performed by unit 208 as described in the ISO 13818-2 standard. Let xb and yb be the full-resolution
horizontal and vertical positions of the macroblock being decoded and let mv = (dx,dy) be the decoded motion vector,
so that if the sequence were being decoded at full resolution, a block of pixel values at location (xb+(dx/2), yb+(dy/2))
in the full-resolution reference luma picture would be read from memory and used to form luma predictions. Similarly,
a block of chroma values at location (xb/2+(dx//2), yb/2+(dy//2)) in the reference chroma picture would be needed to
form predictions for each of the 2 chroma components in a full-resolution mode.

[0033] The location in the reference picture of a block needed for motion compensation in unit 208 is determined
using xb, yb, dx and dy. Table 3 shows the locations of blocks for various prediction modes. The sizes of the blocks
needed for motion compensation in unit 208 are specified in Table 4. Base-layer entries in Table 4 give the size of the
block d1(x,y), and enhancement-layer entries in Table 4 give the size of the block d0(x,y).



Table 3 - Locations of Blocks Needed for Motion Compensation in Enhanced MCU Processing Means 208

Prediction Mode                                  Horizontal Coordinate    Vertical Coordinate
Progressive sequence, luma, base-layer           ((xb+(dx/2))/8)*4        (yb+(dy/2))/2
Progressive sequence, luma, enhancement-layer    ((xb+(dx/2))/8)*4        ((yb+(dy/2))/2)*2
Progressive sequence, chroma                     xb/4+((dx//2)/4)         yb/4+((dy//2)/4)
Interlaced sequence, luma, base-layer            ((xb+(dx/2))/8)*2        yb+(dy/2)
Interlaced sequence, luma, enhancement-layer     ((xb+(dx/2))/8)*4        yb+(dy/2)
Interlaced sequence, chroma                      xb/8+((dx//2)/8)         yb/2+((dy//2)/2)
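The entries of Table 3 can be transcribed into a small lookup function. This Python sketch is illustrative only: the function name `block_origin` and the component labels are hypothetical, and the vertical coordinate for the progressive enhancement-layer is read as ((yb+(dy/2))/2)*2, which is an assumption about a term that is hard to read in the table.

```python
def trunc_div(a, b):
    """The symbol '//': integer division truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def block_origin(xb, yb, dx, dy, progressive, component):
    """Upper-left corner of the reference block per Table 3.

    component is 'luma_base', 'luma_enh' or 'chroma' (hypothetical labels).
    Python's // floors, matching the symbol '/' of the text.
    """
    if component == "chroma":
        cx, cy = trunc_div(dx, 2), trunc_div(dy, 2)
        if progressive:
            return xb // 4 + cx // 4, yb // 4 + cy // 4
        return xb // 8 + cx // 8, yb // 2 + cy // 2
    x_full = xb + dx // 2
    y_full = yb + dy // 2
    if progressive:
        x, y = (x_full // 8) * 4, y_full // 2
        return (x, y) if component == "luma_base" else (x, y * 2)
    x = (x_full // 8) * (2 if component == "luma_base" else 4)
    return x, y_full
```

For example, a macroblock at (xb, yb) = (32, 16) with motion vector (5, -3) in an interlaced sequence maps to base-layer origin (8, 14) and enhancement-layer origin (16, 14).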



Table 4 - Sizes of Blocks Needed for Motion Compensation in Enhanced MCU Processing Means 208

Prediction Mode                                                    Horizontal Size    Vertical Size
Progressive sequence, luma, base-layer                             12                 9
Progressive sequence, luma, enhancement-layer                      12                 18
Progressive sequence, chroma                                       5                  5
Interlaced sequence, luma, 16x16 prediction, base-layer            6                  17
Interlaced sequence, luma, 16x8 prediction, base-layer             6                  9
Interlaced sequence, luma, 16x16 prediction, enhancement-layer     12                 17
Interlaced sequence, luma, 16x8 prediction, enhancement-layer      12                 9
Interlaced sequence, chroma, 8x8 prediction                        3                  9
Interlaced sequence, chroma, 8x4 prediction                        3                  5



[0034] Figure 3 shows the processing performed on the luma samples read from memory 206 by unit 208. As shown
in Figure 3, the luma-processing portion of unit 208 comprises enhancement-layer decoder means 300,
enhancement-layer pixel reconstruction means 302, DCT-based upsample means 304, full-resolution block select means 306,
DCT-based downsample means 308 and two-layer output formation means 310. These elements of enhanced MCU
processing means 208 use the reduced-resolution blocks from I and P frames stored in memory 206 to form predictions for a
decoded macroblock.

[0035] The above-described structure of Figure 3 performs computational processing of luma pixel values input to
unit 208 from memory 206. This computational process is described in detail in appendices B to G. Briefly, however,
decoder 300 unpacks the input 128-bit words into 16 constituent 8-bit codewords. Decoder 300, employing the
computational processing described in appendix B, derives d0(x,y) as an output. Enhancement-layer pixel reconstruction
means 302, employing the computational processing described in appendix C, derives r0(x,y) as an output in response
to both the d1(x,y) input to unit 208 and the d0(x,y) output from decoder 300. DCT-based upsample means 304,
employing the computational processing described in appendix D, horizontally upsamples the r0(x,y) input to derive r(x,y)
at full resolution. Full-resolution block select means 306, employing the computational processing described in
appendix E, uses r(x,y) to derive a full-resolution block of predictions p(x,y) as an output. DCT-based downsample means
308, employing the computational processing described in appendix F, horizontally downsamples the p(x,y) input to
derive q(x,y) at half horizontal resolution. The block q(x,y) is applied as an input to two-layer output formation means
310, which employs the computational processing described in appendix G, to derive the outputs p1(x,y) and p0(x,y)
provided by unit 208 to adders 205B and 205E shown in FIGURE 2. Although the computational processing required
for chroma predictions is not shown in FIGURE 3, it is described in detail in appendix H.
[0036] Returning again to FIGURE 2, a video signal comprising the base-layer pixels defining each of successive
picture fields or frames is output from memory 206 and input to sample-rate converter 210 which derives a display
video signal output. The display video signal output from unit 210 represents a PIP display image. By way of example,
assume that the size of a PIP display is intended to occupy 1/3 of the horizontal dimension and 1/3 of the vertical
dimension of the entire HD display size. If the original resolution of the HD bit-stream is 1920x1080 interlaced, then
the PIP decoded frames (in which the number of pixels in the horizontal direction has been reduced by 3/4, i.e.,
decimated by a factor of 4) are 480x1080 interlaced. Assuming a 1920x1080 interlaced HD display, the displayed PIP
frames should be 640x360 interlaced. Therefore, in this example, the decoded frames stored in memory must be scaled
by sample-rate converter 210 by a factor of 4/3 in the horizontal direction and 1/3 in the vertical direction.
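The scale factors in this example follow from simple ratios, as the short Python sketch below shows. The function name `pip_scale_factors` is hypothetical and the code only reproduces the arithmetic of the example; it does not model sample-rate converter 210 itself.

```python
from fractions import Fraction


def pip_scale_factors(stored, display, pip_fraction):
    """Return (horizontal, vertical) scale factors from stored to PIP size.

    stored and display are (width, height) tuples; pip_fraction is the share
    of each display dimension that the PIP should occupy.
    """
    target_w = display[0] * pip_fraction
    target_h = display[1] * pip_fraction
    return Fraction(target_w) / stored[0], Fraction(target_h) / stored[1]


h_scale, v_scale = pip_scale_factors((480, 1080), (1920, 1080), Fraction(1, 3))
```

Here the 480-pixel stored width must grow to 640 pixels while the 1080 stored lines must shrink to 360, giving 4/3 and 1/3 respectively.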

[0037] In a realized embodiment of the present invention, the extra capacity required in memory for storage of the
1/2 resolution enhancement-layer in encoded form adds only 1.98 Mbits to the 17.8 Mbits required for storage of the
1/4 resolution base-layer. Thus, the inclusion of an encoded 1/2 resolution enhancement-layer increases the needed
storage capacity of the base and enhancement-layer decimated-pixel memory by a relatively small amount (i.e., only
a little more than 11%) to 19.78 Mbits.
APPENDIX A
Enhancement-layer Encoder 207

In the following procedure, used to encode each pair of enhancement-layer
values v0, v1, the symbol "DIV" represents integer division with rounding of the result
to the nearest integer.

First, the values v0 and v1 are clipped to the range [-45,45]. Then the 8-bit
codeword C is computed as shown in the following pseudocode:

if (v0 > -4 AND v0 < 4 AND v1 > -4 AND v1 < 4)
    C = 7*(v1+3) + v0 + 211
else
    if (v1 < v0 - 25)
        v0 = (v0 + v1 + 25)/2
        v1 = v0 - 25
    else if (v1 > v0 + 25)
        v0 = (v0 + v1 - 25)/2
        v1 = v0 + 25
    v0 = 5*(v0 DIV 5)
    v1 = 5*(v1 DIV 5)
    C = 104 - 2*v1 - v0/5

The codeword C is the value stored in memory to represent the pair v0, v1.
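The encoding procedure can be written as a short runnable sketch. The Python below is a transcription under stated assumptions, not a definitive implementation: the names `div_round` and `encode_pair` are hypothetical, the symbol / is taken to be the floor division defined in the description, and the constants are used exactly as they appear in the pseudocode.

```python
def div_round(a, b):
    """'DIV': integer division rounded to the nearest integer (b > 0).

    Exact halves would round up here; they cannot occur for integer a, b = 5.
    """
    return (2 * a + b) // (2 * b)


def encode_pair(v0, v1):
    """Encode one pair of enhancement-layer values into a codeword C."""
    v0 = max(-45, min(45, v0))  # clip both values to [-45, 45]
    v1 = max(-45, min(45, v1))
    if -4 < v0 < 4 and -4 < v1 < 4:
        # Small values: fine 7x7 grid with unit step
        return 7 * (v1 + 3) + v0 + 211
    # Large values: limit the difference to 25, then quantize to steps of 5
    if v1 < v0 - 25:
        v0 = (v0 + v1 + 25) // 2
        v1 = v0 - 25
    elif v1 > v0 + 25:
        v0 = (v0 + v1 - 25) // 2
        v1 = v0 + 25
    v0 = 5 * div_round(v0, 5)
    v1 = 5 * div_round(v1, 5)
    return 104 - 2 * v1 - v0 // 5
```

For example, the pair (5, 0) encodes to 103 and the pair (45, -45) is first pulled in to (12, -13), quantized to (10, -15), and encoded as 132. With the constants as transcribed, the pair (3, 3) evaluates to 256, which does not fit in 8 bits, so the offset in the small-value branch may deserve checking against the original publication.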



EP1 054 566A1 



APPENDIX B
Enhancement-layer Decoder 300

After a 128-bit enhancement-layer word read out from base and enhancement-
layer memory 206 has been unpacked into 16 separate 8-bit codewords, each codeword
can be decoded as described below. Let C be a codeword to be decoded and let b0 and
b1 be the values to be obtained by decoding C. Then the following pseudocode shows
how b0 and b1 should be computed:

if (C < 204)
    b0 = 70 - 5*(C/11) - 5*(C%11)
    b1 = 45 - 5*(C/11)
else
    b0 = (C-208)%7 - 3
    b1 = (C-208)/7 - 3

Each decoded codeword is used to fill in a portion of d0(x,y). For progressive
sequences, lines of d0(x,y) which come from odd-indexed lines in the reference picture
are obtained by negating the appropriate decoded codeword values. Similarly, for
interlaced sequences, columns of d0(x,y) which come from odd-indexed columns in the
reference picture are obtained by negating the appropriate decoded codeword values.
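A minimal Python sketch of the decoding step follows. It assumes that the two poorly legible terms in the first branch read C/11 and C%11, a reading that makes the decoder the algebraic inverse of the large-value encoder formula C = 104 - 2*v1 - v0/5 in Appendix A. The function name `decode_codeword` is hypothetical.

```python
def decode_codeword(c):
    """Decode one 8-bit codeword into the pair (b0, b1)."""
    if c < 204:
        # Large-value branch: both outputs are multiples of 5
        b1 = 45 - 5 * (c // 11)
        b0 = 70 - 5 * (c // 11) - 5 * (c % 11)
    else:
        # Small-value branch: both outputs lie in [-3, 3]
        b0 = (c - 208) % 7 - 3
        b1 = (c - 208) // 7 - 3
    return b0, b1
```

As a check against the encoder, codeword 103 decodes to (5, 0), codeword 132 to (10, -15), and codeword 232 to (0, 0). The negation for odd-indexed lines or columns described above would be applied to these outputs afterwards.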
APPENDIX C
Enhancement-layer Pixel Reconstruction Means 302

Using the block d0(x,y) output from enhancement-layer decoder 300 and block
d1(x,y) read out from base and enhancement-layer memory 206, the pixel values at the
enhancement-layer resolution (i.e., 1/2 horizontal, full vertical resolution) can be
reconstructed for both P pictures and B pictures. For P pictures d0(x,y) is obtained as
described in Appendix B; for B pictures d0(x,y) is assumed to be 0 for all x and y.

The equations used to combine the 2 layers for progressive sequences are given
by

r0(x,2y) = d1(x,y) + d0(x,2y)
r0(x,2y+1) = d1(x,y) + d0(x,2y+1)

for x = 0,...,11; y = 0,...,8.

The equations used to combine the 2 layers for interlaced sequences are given
by

r0(2x,y) = d1(x,y) + d0(2x,y)
r0(2x+1,y) = d1(x,y) + d0(2x+1,y)

where x = 0,...,5; y = 0,...,8 for 16x8 prediction and x = 0,...,5; y = 0,...,16 for 16x16
prediction.

The block r0(x,y), obtained using the above equations, contains the
enhancement-layer pixels at one-half horizontal resolution, so that the enhancement-
layer pixels still need to be upsampled by a factor of 2 horizontally to reach full
resolution.
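The two-layer combination amounts to repeating each base-layer value and adding the corresponding residual. The following Python sketch illustrates this; the function names are hypothetical, and blocks are stored as lists of rows so that element [y][x] corresponds to (x, y) in the equations.

```python
def reconstruct_progressive(d1, d0):
    """r0(x, 2y) and r0(x, 2y+1) both use d1(x, y); d0 has twice as many rows."""
    return [[d1[y // 2][x] + d0[y][x] for x in range(len(d0[y]))]
            for y in range(len(d0))]


def reconstruct_interlaced(d1, d0):
    """r0(2x, y) and r0(2x+1, y) both use d1(x, y); d0 rows are twice as wide."""
    return [[d1[y][x // 2] + d0[y][x] for x in range(len(d0[y]))]
            for y in range(len(d0))]


r0_interlaced = reconstruct_interlaced([[15, 2]], [[-5, 5, -2, 2]])
r0_progressive = reconstruct_progressive([[15, 2]], [[-5, -2], [5, 2]])
```

For a B picture, where the residuals are taken to be zero, both functions reduce to plain pixel repetition of the base layer.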
APPENDIX D
DCT-Based Upsample Means 304

DCT-based upsample means 304 linearly transforms the 12 input pixel values
in each row of r0(x,y) to 24 output pixel values representing each row of r(x,y) at full
resolution. As shown in FIGURE 4, the 12 pixel values in a row are computationally
processed in 3 groups of 4, where the 12 input pixel values are divided into 3 groups
W1, W2 and W3. For each of the 3 groups, the DCT is a 4-point DCT, followed by a
zero pad that adds 4 zeros to the end of that 4-point DCT output to result in an 8-point
output from that zero pad. The IDCT for each of the 3 groups is an 8-point IDCT,
which results in the 24 output pixel values representing each row of r(x,y) at full
resolution comprising 3 groups Z1, Z2 and Z3, wherein the output pixel values for each
of the 3 groups consists of 8 pixel values.
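The chain of 4-point DCT, zero pad and 8-point IDCT collapses into a single 8x4 matrix per group, which the Python sketch below derives numerically. It is a sketch under assumptions: orthonormal DCT-II definitions are used, and a gain of sqrt(2) is assumed so that a constant input row is reproduced at the same amplitude. The function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix; row k is the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def upsample_matrix():
    """8x4 matrix: 4-point DCT, zero pad to 8 coefficients, 8-point IDCT."""
    d4, d8 = dct_matrix(4), dct_matrix(8)
    gain = math.sqrt(2.0)  # assumed amplitude correction for doubling the length
    return [[gain * sum(d8[k][m] * d4[k][n] for k in range(4)) for n in range(4)]
            for m in range(8)]


def upsample_group(w):
    """Map one group of 4 input pixels to 8 output pixels."""
    a = upsample_matrix()
    return [sum(a[m][n] * w[n] for n in range(4)) for m in range(8)]


flat = upsample_group([10, 10, 10, 10])
corner = round(64 * upsample_matrix()[7][3])
```

Scaling the derived matrix by 64 and rounding gives integer weights of the kind used in a fixed-point implementation; the bottom-right weight comes out as 76 under these assumptions.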

The linear transformation shown in Figure 4 can be computationally
implemented as 3 matrix-vector multiplications, given by the equations

$$Z_j = \tfrac{1}{64} A W_j \quad \text{for } j = 1, 2, \text{ and } 3,$$

where

$$A = \begin{bmatrix}
76 & -18 & 8 & -2 \\
47 & 23 & -9 & 3 \\
12 & 60 & -11 & 3 \\
-6 & 56 & 18 & -4 \\
-4 & 18 & 56 & -6 \\
3 & -11 & 60 & 12 \\
3 & -9 & 23 & 47 \\
-2 & 8 & -18 & 76
\end{bmatrix}$$

The result of performing this linear transformation on each row of input pixel
values in r0(x,y) is the block of r(x,y) output pixel values at full resolution.
APPENDIX E
Full-Resolution Block Select Means 306

A final 16x16 or 16x8 full-resolution block of predictions, denoted p(x,y), must
be obtained from r(x,y), using the motion vector (dx,dy) to select a subset of the pixels
in r(x,y), with half-pixel interpolation if necessary. More specifically, if the upper-left
pixel in r(x,y) has coordinates (0,0) and the coordinates of r(x,y) are specified with
half-pixel precision, then the upper-left corner of the final prediction block is
r(dx%16,dy%4) for progressive sequences and r(dx%16,dy%2) for interlaced
sequences. Since the coordinates of r(x,y) are specified with half-pixel precision, odd
values of dx%M or dy%M (for M = 2,4,16) imply that half-pixel interpolation, as
described in the ISO 13818-2 standard, must be performed.
APPENDIX F
DCT-Based Downsample Means 308

DCT-based downsample means 308 linearly transforms each row of p(x,y) from
16 pixels to 8 pixels. As shown in FIGURE 5, the 16-pixel row is first divided into two
groups of 8 pixels, U1 and U2. For each of the 2 groups, the DCT is an 8-point DCT,
followed by a truncation which discards the last 4 points in the DCT. The IDCT for
each of the 2 groups is a 4-point IDCT, which results in the 8 output pixel values
representing each row of q(x,y) at one-half resolution comprising 2 groups V1 and V2,
wherein the output pixel values for each of the 2 groups consists of 4 pixel values. The
computationally-processed output of DCT-based downsample means 308 shown in
FIGURE 5 is a set of 8 pixels, formed from the concatenation of V1 and V2.
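The 8-point DCT, truncation and 4-point IDCT can likewise be folded into one 4x8 matrix per group. The Python sketch below derives it under the same kind of assumptions as for upsampling: orthonormal DCT-II definitions and an assumed gain of 1/sqrt(2) so that a constant row keeps its amplitude. The function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix; row k is the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def downsample_matrix():
    """4x8 matrix: 8-point DCT, keep the first 4 coefficients, 4-point IDCT."""
    d8, d4 = dct_matrix(8), dct_matrix(4)
    gain = 1.0 / math.sqrt(2.0)  # assumed amplitude correction for halving the length
    return [[gain * sum(d4[k][m] * d8[k][n] for k in range(4)) for n in range(8)]
            for m in range(4)]


def downsample_group(u):
    """Map one group of 8 input pixels to 4 output pixels."""
    b = downsample_matrix()
    return [sum(b[m][n] * u[n] for n in range(8)) for m in range(4)]


flat = downsample_group([10] * 8)
first_column = [round(64 * row[0]) for row in downsample_matrix()[:2]]
```

When the derived matrix is scaled by 64 and rounded, the first two entries of its first column come out as 38 and -9 under these assumptions.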

The linear transformation shown in Figure 5 can be computationally
implemented as 2 matrix-vector multiplications, given by the equations Zj = B Uj for j =
1 and 2, where

[38 
' -9 
The result of performing this linear transformation on each row of input pixel
values in p(x,y) at full resolution is the block of q(x,y) output pixel values at one-half
horizontal resolution.
APPENDIX G
Two-Layer Output Formation Means 310

In the following procedure, the symbol "DIV" represents integer division with
rounding of the result to the nearest integer. The base-layer output of the MCU,
denoted p1(x,y), is obtained for progressive sequences using the equation

p1(x,y) = (q(x,2y) + q(x,2y+1)) DIV 128

for x = 0,...,7; y = 0,...,7.

The base-layer output of the MCU, denoted p1(x,y), is obtained for interlaced
sequences using the equation

p1(x,y) = (q(2x,y) + q(2x+1,y)) DIV 128

for x = 0,...,3; y = 0,...,7 for 16x8 prediction and x = 0,...,3; y = 0,...,15 for 16x16
prediction.

The enhancement-layer output of the MCU, denoted p0(x,y) and computed only
for P pictures, is obtained for progressive sequences using the equation

p0(x,y) = (q(x,y) - q(x,y+1)) DIV 128

for x = 0,...,7; y = 0,2,4,...,12,14.

The enhancement-layer output of the MCU, denoted p0(x,y) and computed only
for P pictures, is obtained for interlaced sequences using the equation

p0(x,y) = (q(x,y) - q(x+1,y)) DIV 128

for x = 0,2,4,6; y = 0,...,7 for 16x8 prediction and x = 0,2,4,6; y = 0,...,15 for 16x16
prediction.

As shown in FIGURE 2, the MCU base-layer output p1(x,y) is applied as an
input to adder 205B and the MCU enhancement-layer output p0(x,y) is applied as an
input to adder 205E.
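The interlaced-sequence equations above can be sketched in Python as follows. The names `div_round` and `two_layer_interlaced` are hypothetical, the divisor 128 is taken directly from the equations, and rounding exact halves away from zero is an assumption because the text only says "nearest integer".

```python
def div_round(a, b):
    """'DIV': integer division rounded to nearest; ties away from zero (assumed)."""
    q = (2 * abs(a) + b) // (2 * b)
    return q if a >= 0 else -q


def two_layer_interlaced(q):
    """Form base-layer and enhancement-layer outputs from rows of q(x, y).

    q is a list of rows, each with 8 values at half horizontal resolution.
    Returns (p1, p0): p1 rows hold 4 values; p0 rows map x in (0, 2, 4, 6)
    to the residual for that pixel pair.
    """
    p1 = [[div_round(row[2 * x] + row[2 * x + 1], 128) for x in range(4)]
          for row in q]
    p0 = [{x: div_round(row[x] - row[x + 1], 128) for x in (0, 2, 4, 6)}
          for row in q]
    return p1, p0


p1, p0 = two_layer_interlaced([[640, 1280, 0, 256, 64, 64, 128, 0]])
```

In this example the first pair (640, 1280) yields a base value of 15 and a residual of -5, so base plus residual and base minus residual recover 10 and 20, i.e., the two inputs divided by 64.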
APPENDIX H
Forming Block of Predictions for the Chroma Component

In the following procedure, the symbol "DIV" represents integer division with
rounding of the result to the nearest integer. If the block of predictions being formed is
for the chroma component, then the samples d1(x,y) are upsampled to full resolution
using a pixel repeat operation. Then the final full-resolution 8x8 or 8x4 block of
predictions is selected using the motion vector (dx,dy), with half-pixel interpolation, if
necessary. Finally, the full-resolution predictions are downsampled to provide base-
layer predictions to be stored in base and enhancement-layer memory 206.

The pixel repeat operation, which transforms the base-layer block d1(x,y) to the
full-resolution block r(x,y), is performed for progressive sequences using the equation

r(x,y) = d1(x/2,y/2)

for x = 0,...,9; y = 0,...,9.

The pixel repeat operation, which transforms the base-layer block d1(x,y) to the
full-resolution block r(x,y), is performed for interlaced sequences using the equation

r(x,y) = d1(x/4,y)

for x = 0,...,11; y = 0,...,4 for 8x4 prediction and x = 0,...,11; y = 0,...,8 for 8x8
prediction.

The final 8x8 or 8x4 full-resolution block p(x,y) is obtained from r(x,y). The
upper-left corner of this final block is r((dx//2)%4,(dy//2)%4) for progressive
sequences and r((dx//2)%8,(dy//2)%2) for interlaced sequences. Odd values of
(dx//2)%M or (dy//2)%M (for M = 2,4,8) imply half-pixel interpolation is necessary.

The base-layer prediction block p1(x,y) is obtained by downsampling p(x,y).
For progressive sequences the equation used to perform this downsampling is

p1(x,y) = (p(2x,2y) + p(2x+1,2y) + p(2x,2y+1) + p(2x+1,2y+1)) DIV 4

for x = 0,...,3; y = 0,...,3. For interlaced sequences the equation used to perform this
downsampling is

p1(x,y) = (p(4x,y) + p(4x+1,y) + p(4x+2,y) + p(4x+3,y)) DIV 4

for x = 0,1; y = 0,...,3 for 8x4 prediction and x = 0,1; y = 0,...,7 for 8x8 prediction.

As shown in FIGURE 2, the MCU base-layer output p1(x,y) is applied as an
input to adder 205B.
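For progressive sequences, the chroma path reduces to pixel repetition followed by 2x2 averaging. The Python sketch below shows both steps with the block-select step simplified to taking the upper-left 8x8 region (i.e., a zero offset and no half-pixel interpolation), which is an assumption made to keep the example short. The function names are hypothetical.

```python
def div_round(a, b):
    """'DIV': integer division rounded to nearest; ties away from zero (assumed)."""
    q = (2 * abs(a) + b) // (2 * b)
    return q if a >= 0 else -q


def pixel_repeat_progressive(d1, size=10):
    """r(x, y) = d1(x/2, y/2) for x, y = 0..size-1; blocks are stored as [y][x]."""
    return [[d1[y // 2][x // 2] for x in range(size)] for y in range(size)]


def downsample_progressive(p):
    """p1(x, y): rounded mean of each 2x2 group of an 8x8 block p."""
    return [[div_round(p[2 * y][2 * x] + p[2 * y][2 * x + 1]
                       + p[2 * y + 1][2 * x] + p[2 * y + 1][2 * x + 1], 4)
             for x in range(4)] for y in range(4)]


d1 = [[5 * y + x for x in range(5)] for y in range(5)]  # 5x5 base-layer block
r = pixel_repeat_progressive(d1)
p = [row[:8] for row in r[:8]]  # simplified block select at offset (0, 0)
p1 = downsample_progressive(p)
```

With a zero offset the round trip is lossless: averaging the four repeated copies of each base-layer sample returns that sample.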
Claims

1. Apparatus for decoding compressed image data including frequency domain coefficients defining blocks of pixel
values representing an image at a first resolution to provide an image at a reduced second resolution for display,
said apparatus characterized by:

first means (102) responsive to a selected sub-set of said frequency domain coefficients for deriving said
image of said reduced second resolution for display and including,
enhanced motion-compensation-unit (MCU) processing means (208); and
second means (204, 205B, 205E, 206, 207) for operating said enhanced MCU processing means with blocks
of pixel values representing said image at an intermediate third resolution lower than said first resolution and
higher than said reduced second resolution.

2. The apparatus defined in Claim 1, characterized in that said reduced second resolution is substantially 1/4 of said
first resolution; and said second means operates said enhanced MCU processing at an intermediate third resolution
which is substantially 1/2 of said first resolution.

3. The apparatus defined in Claim 1, characterized in that said image at said reduced second resolution for display
is a progressive-scanned image.

4. The apparatus defined in Claim 1, characterized in that said image at said reduced second resolution for display
is an interlaced-scanned image.

5. The apparatus defined in Claim 1, characterized in that:
said enhanced MCU processing means is responsive to base-layer pixel macroblock input values representing
said image at said reduced second resolution and to pixel values representing said image at said intermediate
third resolution for deriving motion-compensated base-layer prediction macroblock output pixel values as a first
output and motion-compensated enhancement-layer prediction macroblock output pixel residual values as a
second output.

6. The apparatus defined in Claim 5, characterized in that:
said second means comprises third means responsive to said selected sub-set of said frequency domain
coefficients and to both said motion-compensated base-layer macroblock output pixel values and said
enhancement-layer macroblock output pixel residual values for deriving both said base-layer macroblock input pixel values
and said encoded enhancement-layer macroblock input pixel residual values.

7. The apparatus defined in Claim 1, characterized in that said second means comprises:

a base and enhancement-layer decimated-pixel memory;
unitary enhanced inverse discrete cosine transform (IDCT), filtering and pixel-decimation processing means
responsive to a selected sub-set of frequency domain coefficients for deriving base-layer blocks of output pixel
values representing said image at said reduced second resolution as a first output and output enhancement-
layer blocks of output pixel residual values representing said image at said intermediate third resolution as a
second output;
fourth means, including a first adder for adding corresponding pixel values of said motion-compensated base-
layer macroblock output pixel values from said enhanced MCU processing means and said base-layer blocks
of output pixel values from said unitary IDCT, filtering and pixel-decimation processing means, for deriving
values that are stored as base-layer data in said base and enhancement-layer decimated-pixel memory;
fifth means, including a second adder and an enhancement-layer encoder, for adding corresponding pixel
residual values of said motion-compensated enhancement-layer macroblock output pixel residual values from
said enhanced MCU processing means to said enhancement-layer blocks of output pixel residual values from
said unitary IDCT, filtering and pixel-decimation processing means to obtain a sum output from said second
adder for encoding by said enhancement-layer encoder, for deriving second input values that are stored as
encoded enhancement-layer data in said base and enhancement-layer decimated-pixel memory; and
sixth means for providing from said base and enhancement-layer decimated-pixel memory said base-layer
pixel macroblock input values to said enhanced MCU processing means and for deriving said encoded
enhancement-layer pixel macroblock input residual values applied as a second input to said enhanced MCU
processing means from said stored encoded enhancement-layer data.

8. The apparatus defined in Claim 1, characterized in that:
said frequency domain coefficients define image information that includes luma blocks of pixel values
representing intracoded (I) and predictive-coded (P) progressive-scanned images at said first resolution.

9. The apparatus defined in Claim 7, further characterized by:
seventh means comprising a sample-rate converter for deriving an ongoing display video signal from base-
layer blocks of output pixel values.

10. The apparatus defined in Claim 1, characterized in that:

said reduced second resolution is substantially 1/4 of said first resolution; and
said intermediate third resolution is substantially 1/2 of said first resolution.

11. In a system for decoding compressed image data in the form of pixel blocks representing an image of a first
resolution to provide an image of a reduced second resolution, a method characterized by the steps of:

generating data representative of an image pixel block at an intermediate third resolution lower than said first
resolution but higher than said reduced second resolution;
generating motion compensated pixel block data at said third resolution from pixel block data of said reduced
second resolution supplemented by said intermediate third resolution data; and
deriving pixel data representing said image of said reduced second resolution from said motion compensated
pixel block data at said third resolution.

12. A method according to claim 11 characterized in that the steps of claim 18 are performed for P frames exclusively
of I and B frames.

13. A method according to claim 11 characterized in that the steps of claim 18 are performed for P frames and one
of, (a) I frames and (b) B frames.

14. A method according to claim 11 further characterized by the step of
upsampling said pixel block data at said third resolution to provide image data of said first resolution.

15. A method according to claim 14 further characterized by the step of
downsampling said upsampled pixel block data of said first resolution to provide image data of said second
resolution.

16. A method according to claim 14 further characterized by the step of
downsampling said upsampled pixel block data of said first resolution to provide said intermediate third
resolution data.

17. A method according to claim 11 characterized in that
said pixel block data of said third resolution comprises residual data.